Recency is good: expanding with fresh news improves event detection in Twitter

Authors

  • Sasa Petrovic
  • Miles Osborne
  • Victor Lavrenko
Abstract

Twitter is a popular microblogging site that is a good source of real-time information. Detecting events in Twitter is an ongoing research effort, and a fundamental task is clustering tweets according to which (news) event they describe. Document expansion can improve this clustering, especially for Twitter, given that tweets are short. While document expansion using external corpora has been around for years [1], all previous work treats the external corpus as temporally static. We are the first to treat the external corpus (newswire articles in this case) as a time-synchronous stream, expanding tweets with words found in similar, temporally aligned newswire articles. Tweets are expanded with terms from the most similar newswire document, where the terms are weighted by the cosine similarity between the tweet and the newswire document [2]. Using the tweet corpus compiled by [3] and newswire data from the same time period, drawn from eight major newswire sources (Reuters, CNN, BBC, New York Times, Google News, Guardian, Wired, The Register), we find that using timely newswire as expansion material improves event detection for Twitter more than using older newswire for the same purpose. Expanding tweets with fresh and with stale newswire improves event detection by 21% and 17% respectively, compared with no expansion.
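As a rough illustration of the expansion step described in the abstract, the Python sketch below represents texts as bag-of-words term-frequency vectors, keeps only newswire articles published within a time window before the tweet, picks the article with the highest cosine similarity, and adds that article's terms to the tweet scaled by the similarity. The whitespace tokenisation, the 24-hour window, the function names (`tf_vector`, `cosine`, `expand_tweet`) and the additive combination rule are all hypothetical choices made for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenised; an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def expand_tweet(tweet_text, tweet_time, articles, window=timedelta(hours=24)):
    """Expand a tweet with terms from the most similar temporally aligned article.

    articles: list of (published_time, text) pairs.
    Hypothetical time-alignment rule: only articles published within `window`
    before the tweet are candidates. Returns (expanded_vector, similarity).
    """
    tweet_vec = tf_vector(tweet_text)
    best_sim, best_vec = 0.0, None
    for published, text in articles:
        if tweet_time - window <= published <= tweet_time:
            vec = tf_vector(text)
            sim = cosine(tweet_vec, vec)
            # Strict '>' means articles sharing no terms (sim 0) are never used.
            if sim > best_sim:
                best_sim, best_vec = sim, vec

    # Start from the tweet's own term weights.
    expanded = {t: float(w) for t, w in tweet_vec.items()}
    if best_vec is not None:
        # Add the article's terms, each scaled by the tweet-article cosine similarity.
        for term, weight in best_vec.items():
            expanded[term] = expanded.get(term, 0.0) + best_sim * weight
    return expanded, best_sim
```

For a toy tweet "huge earthquake in japan" and an article "earthquake strikes off coast of japan" published two hours earlier, the shared terms are "earthquake" and "japan", so the similarity is 2 / (2 × √6), about 0.41, and terms such as "strikes" and "coast" enter the expanded vector with that weight; an article older than the window is ignored. Widening or shifting the window toward older articles gives a simple way to contrast fresh with stale expansion material.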

Similar resources

Feature-Rich Segment-Based News Event Detection on Twitter

Event detection on Twitter is an important and challenging research topic. On the one hand, Twitter provides first-hand information and fast broadcasting. On the other, challenges include short and noisy content, big volume data and fast-changing topics. Dominant approaches for Twitter event detection model events by clustering tweets, words or segments, while segments have been proven to be ad...

The early bird catches the term: combining twitter and news data for event detection and situational awareness

BACKGROUND Twitter updates now represent an enormous stream of information originating from a wide variety of formal and informal sources, much of which is relevant to real-world events. They can therefore be highly useful for event detection and situational awareness applications. RESULTS In this paper we apply customised filtering techniques to existing bio-surveillance algorithms to detect...

Report on Time is of the Essence: Improving Recency Ranking Using Twitter Data

Recency content has become a critical issue in information retrieval. Efficient retrieval of fresh and relevant documents has not been fully overcome yet and it is an increasing topic of interest in academic and commercial research. So far, web search engines manage to deal reasonably well with the classical Navigational, Transactional and Informational queries. However with a more granular cat...

Real-time Detection of Content Polluters in Partially Observable Twitter Networks

Content polluters, or bots that hijack a conversation for political or advertising purposes are a known problem for event prediction, election forecasting and when distinguishing real news from fake news in social media data. Identifying this type of bot is particularly challenging, with state-of-the-art methods utilising large volumes of network data as features for machine learning models. Su...

A Survey of Techniques for Event Detection in Twitter

Twitter is among the fastest-growing microblogging and online social networking services. Messages posted on Twitter (tweets) have been reporting everything from daily life stories to the latest local and global news and events. Monitoring and analyzing this rich and continuous user-generated content can yield unprecedentedly valuable information, enabling users and organizations to acquire act...

Journal title:
  • TinyToCS

Volume 2

Publication date: 2013